Import the Python API module and instantiate the GIS object

Import the Python API



In [1]:

    
import arcgis

Create a GIS object instance using the account currently logged in through ArcGIS Pro



In [2]:

    
gis_retail = arcgis.gis.GIS('Pro')

Get a Feature Set, data to work with, from the Web GIS Item ID

Create a Web GIS Item instance using the Item ID



In [3]:

    
trade_area_itemid = 'bf361f9081fd43a7ba57357e74ccc373'

item = arcgis.gis.Item(gis=gis_retail, itemid=trade_area_itemid)
item









    Out[3]:





                    
                       
                        
                       
                    

                    
                        Drive_Time_Trade_Areas
                        
                        
Drive time trade areas with significant data attached for analyzing through machine learning.
Feature Layer Collection by joel5174@esri.com_commteamretail
                        
Last Modified: August 02, 2017
                        
0 comments, 3 views

Since the item only contains one feature layer, get the first layer in the item, the Feature Layer we need to work with.



In [4]:

    
feature_layer = item.layers[0]
feature_layer









    Out[4]:





<FeatureLayer url:"https://services.arcgis.com/PMTtzuTB6WiPuNSv/arcgis/rest/services/Drive_Time_Trade_Areas/FeatureServer/0">

For this initial analysis, query the layer to return just the attributes, without geometry, for the eight-minute trade areas as a Feature Set.



In [5]:

    
feature_set = feature_layer.query(where="AREA_DESC = '0 - 8 minutes'", returnGeometry=False)

Convert the Data into a Pandas Data Frame

Take advantage of the df property on the Feature Set object returned from the query to convert the data to a Pandas Data Frame.



In [6]:

    
data_frame = feature_set.df
data_frame.head()









    Out[6]:







  
    
      
            AGGDI_CY     AGGNW_CY  AMERIND_CY      AREA_DESC      AREA_ID  ASIAN_CY  ASSCDEG_CY  AVGDI_CY  AVGFMSZ_CY  AVGHHSZ_CY  ...  VAL1M_CY  VAL200K_CY  VAL250K_CY  VAL300K_CY  VAL400K_CY  VAL500K_CY  VAL50K_CY  VAL750K_CY  WHITE_CY  WIDOWED_CY
OBJECTID
3          551906518   2390086587         594  0 - 8 minutes  897225868_1      1432        1669     51008        4.33        3.96  ...       157         487         514        1015         938        1290        186         223     19872        1377
6         6247270524  70627761591        1412  0 - 8 minutes  404768898_1     42842       11671     92202        3.30        2.87  ...      4720         472         723        2195        4173       19341        543       11822    113028        7292
9         3033117454  26795808683        1132  0 - 8 minutes  705003717_1     36384        6982     83499        3.65        3.31  ...       871         343         653        2295        4218       11458        733        2607     51285        3382
12        2954490789  12560995634        1413  0 - 8 minutes  427515311_1     22184        4944     66758        3.25        2.52  ...      1117         227         334        1598        2642        5951        137        2150     58040        3501
15        2048112658   7503591356        1168  0 - 8 minutes  435682132_1     17313        3624     62439        3.42        2.65  ...       587         178         314        1410        1888        3907         78        1312     42363        2523
    
  

5 rows × 162 columns

Save dependent and independent variable names as Python variables

Use a quick list comprehension to create a list of field names to be used as independent variables.



In [7]:

    
field_name_independent_list = [field['name'] for field in feature_set.fields if
    field['type'] != 'esriFieldTypeOID' and  # we don't need the Esri object identifier field
    not field['name'].startswith('Shape_') and  # exclude the Esri shape fields
    field['type'] == 'esriFieldTypeDouble' and  # ensure numeric, quantitative fields are the only fields used
    field['name'] != 'STORE_LAT' and  # while numeric, the fields describing the location are not independent variables
    field['name'] != 'STORE_LONG' and  # while numeric, the fields describing the location are not independent variables
    field['name'] != 'SALESVOL'  # exclude the dependent variable
]
print(field_name_independent_list)









    



['TOTPOP_CY', 'HHPOP_CY', 'FAMPOP_CY', 'GQPOP_CY', 'POPDENS_CY', 'TOTHH_CY', 'AVGHHSZ_CY', 'FAMHH_CY', 'AVGFMSZ_CY', 'TOTHU_CY', 'OWNER_CY', 'RENTER_CY', 'VACANT_CY', 'POPGRW10CY', 'HHGRW10CY', 'FAMGRW10CY', 'NOHS_CY', 'SOMEHS_CY', 'HSGRAD_CY', 'GED_CY', 'SMCOLL_CY', 'ASSCDEG_CY', 'BACHDEG_CY', 'GRADDEG_CY', 'NEVMARR_CY', 'MARRIED_CY', 'WIDOWED_CY', 'DIVORCD_CY', 'WHITE_CY', 'BLACK_CY', 'AMERIND_CY', 'ASIAN_CY', 'PACIFIC_CY', 'OTHRACE_CY', 'RACE2UP_CY', 'HISPPOP_CY', 'HISPWHT_CY', 'HISPBLK_CY', 'HISPAI_CY', 'HISPASN_CY', 'HISPPI_CY', 'HISPOTH_CY', 'HISPMLT_CY', 'NONHISP_CY', 'NHSPWHT_CY', 'NHSPBLK_CY', 'NHSPAI_CY', 'NHSPASN_CY', 'NHSPPI_CY', 'NHSPOTH_CY', 'NHSPMLT_CY', 'MINORITYCY', 'DIVINDX_CY', 'HINC0_CY', 'HINC15_CY', 'HINC25_CY', 'HINC35_CY', 'HINC50_CY', 'HINC75_CY', 'HINC100_CY', 'HINC150_CY', 'HINC200_CY', 'MEDHINC_CY', 'AVGHINC_CY', 'PCI_CY', 'DI0_CY', 'DI15_CY', 'DI25_CY', 'DI35_CY', 'DI50_CY', 'DI75_CY', 'DI100_CY', 'DI150_CY', 'DI200_CY', 'AGGDI_CY', 'MEDDI_CY', 'AVGDI_CY', 'NW0_CY', 'NW15_CY', 'NW35_CY', 'NW50_CY', 'NW75_CY', 'NW100_CY', 'NW150_CY', 'NW250_CY', 'NW500_CY', 'AGGNW_CY', 'MEDNW_CY', 'AVGNW_CY', 'VAL0_CY', 'VAL50K_CY', 'VAL100K_CY', 'VAL150K_CY', 'VAL200K_CY', 'VAL250K_CY', 'VAL300K_CY', 'VAL400K_CY', 'VAL500K_CY', 'VAL750K_CY', 'VAL1M_CY', 'MEDVAL_CY', 'AVGVAL_CY', 'CIVLBFR_CY', 'EMP_CY', 'INDAGRI_CY', 'INDMIN_CY', 'INDCONS_CY', 'INDMANU_CY', 'INDWHTR_CY', 'INDRTTR_CY', 'INDTRAN_CY', 'INDUTIL_CY', 'INDINFO_CY', 'INDFIN_CY', 'INDRE_CY', 'INDTECH_CY', 'INDMGMT_CY', 'INDADMN_CY', 'INDEDUC_CY', 'INDHLTH_CY', 'INDARTS_CY', 'INDFOOD_CY', 'INDOTSV_CY', 'INDPUBL_CY', 'UNEMP_CY', 'UNEMPRT_CY', 'OCCMGMT_CY', 'OCCBUS_CY', 'OCCCOMP_CY', 'OCCARCH_CY', 'OCCSSCI_CY', 'OCCSSRV_CY', 'OCCLEGL_CY', 'OCCEDUC_CY', 'OCCENT_CY', 'OCCHTCH_CY', 'OCCHLTH_CY', 'OCCPROT_CY', 'OCCFOOD_CY', 'OCCBLDG_CY', 'OCCPERS_CY', 'OCCSALE_CY', 'OCCADMN_CY', 'OCCFARM_CY', 'OCCCONS_CY', 'OCCMAIN_CY', 'OCCPROD_CY', 'OCCTRAN_CY']
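To make the filtering rules easier to check without a live connection, here is a minimal sketch that applies the same conditions to a handful of made-up field definitions. The `sample_fields` list, the `excluded_names` set, and the `select_independent_fields` helper are all hypothetical names introduced for illustration; only the dictionary shape (`name` and `type` keys) mirrors what the comprehension above reads from `feature_set.fields`.

```python
# Hypothetical field definitions, shaped like the entries read from feature_set.fields.
sample_fields = [
    {'name': 'OBJECTID', 'type': 'esriFieldTypeOID'},
    {'name': 'AREA_DESC', 'type': 'esriFieldTypeString'},
    {'name': 'Shape__Area', 'type': 'esriFieldTypeDouble'},
    {'name': 'STORE_LAT', 'type': 'esriFieldTypeDouble'},
    {'name': 'SALESVOL', 'type': 'esriFieldTypeDouble'},
    {'name': 'TOTPOP_CY', 'type': 'esriFieldTypeDouble'},
    {'name': 'MEDHINC_CY', 'type': 'esriFieldTypeDouble'},
]

# Location fields and the dependent variable are numeric but must be left out.
excluded_names = {'STORE_LAT', 'STORE_LONG', 'SALESVOL'}


def select_independent_fields(fields):
    """Return names of double-typed fields that are not shape, location, or target fields."""
    return [
        field['name'] for field in fields
        if field['type'] == 'esriFieldTypeDouble'
        and not field['name'].startswith('Shape_')
        and field['name'] not in excluded_names
    ]


selected = select_independent_fields(sample_fields)
print(selected)  # ['TOTPOP_CY', 'MEDHINC_CY']
```

Collecting the excluded names in a set keeps the condition short if more location or target fields need to be skipped later.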

Also save the name of the dependent variable field.



In [8]:

    
field_name_dependent = 'SALESVOL'
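One likely use for these saved names is splitting the data frame into independent and dependent parts. The sketch below shows that split on a toy frame; every value in `toy_frame` is made up, and `toy_independent_list` and `toy_dependent` are hypothetical stand-ins for `field_name_independent_list` and `field_name_dependent`.

```python
import pandas as pd

# Toy stand-in for data_frame; all values are invented for illustration only.
toy_frame = pd.DataFrame(
    {
        'TOTPOP_CY': [1000.0, 2500.0, 4000.0],
        'MEDHINC_CY': [42000.0, 58000.0, 71000.0],
        'AREA_DESC': ['0 - 8 minutes'] * 3,
        'SALESVOL': [1.2e6, 2.9e6, 4.4e6],
    },
    index=pd.Index([3, 6, 9], name='OBJECTID'),
)

toy_independent_list = ['TOTPOP_CY', 'MEDHINC_CY']  # plays the role of field_name_independent_list
toy_dependent = 'SALESVOL'  # plays the role of field_name_dependent

# Selecting with a list of names returns a DataFrame; selecting one name returns a Series.
X = toy_frame[toy_independent_list]
y = toy_frame[toy_dependent]

print(X.shape, y.shape)
```

With the real data, the same two selections on `data_frame` would presumably give the inputs for whatever modeling step comes next.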

Now What?

This is where I am in over my head. My initial thought for this first stab is to create four to six store segments using scikit-learn and to identify the defining attributes of each. This addresses the need of stores to identify store segments for customized assortment planning. So although I do not yet know exactly how to accomplish this, I can discuss the business case quite easily.
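As one possible starting point, here is a rough sketch of the segmentation idea using k-means clustering from scikit-learn. It is not a worked-out method for this dataset: the data is synthetic, the choice of k-means and of four segments is an assumption, and the three attribute names are borrowed from the field list only as labels. "Defining attribute" is approximated here as the attribute whose standardized segment center sits furthest from the overall mean.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data_frame[field_name_independent_list]:
# 60 made-up "stores" scattered around 4 made-up centers, 3 attributes each.
feature_names = ['TOTPOP_CY', 'MEDHINC_CY', 'AVGHHSZ_CY']
rng = np.random.default_rng(0)
true_centers = np.array([
    [20000, 40000, 2.2],
    [20000, 90000, 3.5],
    [80000, 40000, 3.5],
    [80000, 90000, 2.2],
])
X = np.vstack([
    center + rng.normal(scale=[2000, 3000, 0.1], size=(15, 3))
    for center in true_centers
])

# Standardize so attributes measured in large units do not dominate the distances.
X_scaled = StandardScaler().fit_transform(X)

# Assumed segment count; the text suggests trying four to six.
n_segments = 4
model = KMeans(n_clusters=n_segments, n_init=10, random_state=0)
segment_labels = model.fit_predict(X_scaled)

# For each segment, report the attribute furthest from the overall mean,
# measured in standard deviations (the centers live in the scaled space).
for segment, center in enumerate(model.cluster_centers_):
    strongest = int(np.argmax(np.abs(center)))
    print(segment, feature_names[strongest], round(float(center[strongest]), 2))
```

On the real data, `X` would come from the independent variable columns, and the number of segments would need to be compared across the four-to-six range, for example with the model's `inertia_` attribute or a silhouette score.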


